llama : fix qs.n_attention_wv for DeepSeek-V2 #9156

compilade · 2024-08-24T14:32:38Z

Should fix #9155

Before #8526, this did not trigger the assertion because a value of 0 was accepted for recurrent models. DeepSeek-V2(-Lite) is not a recurrent model, so a count of 0 now fails the check.

Counting either attn_kv_a_mqa.weight or attn_kv_b.weight should fix this. I went with the shorter of the two (attn_kv_b.weight) so that it lines up vertically with the other conditions in the if that counts those tensors.
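For readers who do not have the quantization code open, here is a minimal sketch of the kind of name-based counting described above. It is not the actual llama.cpp source: the helper names (is_attention_wv, count_attention_wv, check_attention_wv), the two pre-existing patterns (attn_v.weight, attn_qkv.weight), the blk.N. tensor-name prefix, and the exact form of the assertion are all assumptions made for illustration. Only attn_kv_a_mqa.weight and attn_kv_b.weight come from this PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not the real llama.cpp function): returns true when a
// tensor name should be counted toward n_attention_wv.
static bool is_attention_wv(const std::string & name) {
    return name.find("attn_v.weight")    != std::string::npos ||  // assumed existing condition
           name.find("attn_qkv.weight")  != std::string::npos ||  // assumed existing condition
           name.find("attn_kv_b.weight") != std::string::npos;    // condition added for DeepSeek-V2
}

// Counts matching tensors over a list of tensor names.
static int count_attention_wv(const std::vector<std::string> & names) {
    int n_attention_wv = 0;
    for (const auto & name : names) {
        if (is_attention_wv(name)) {
            ++n_attention_wv;
        }
    }
    return n_attention_wv;
}

// Stand-in for the sanity check mentioned above: for a non-recurrent model,
// the count is assumed to have to match the number of attention layers.
static void check_attention_wv(int n_attention_wv, int n_attn_layer) {
    assert(n_attention_wv == n_attn_layer && "n_attention_wv is unexpected");
}
```

With only the first two patterns, a model whose layers expose attn_kv_a_mqa.weight and attn_kv_b.weight but no attn_v.weight would be counted as 0, which is the situation this PR addresses. Matching attn_kv_b.weight gives one hit per layer in this sketch.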

@mann1x can you confirm whether or not this fixes the problem?

I have read the contributing guidelines
Self-reported review complexity:
- Low

mann1x · 2024-08-25T21:06:13Z

@compilade
Sorry, must have seen the notification while I was already sleeping yesterday... was unread but I have no recollection of it.

Tested it and it works; I was able to quantize the model and run it on ollama.

mann1x · 2024-08-27T10:08:20Z

@ggerganov
Can you check the failing tests?
It seems the problem is with the CI pipeline, not the PR.

llama : fix qs.n_attention_wv for DeepSeek-V2

f5f4cde

compilade added the bugfix and Review Complexity : Low labels Aug 24, 2024

ggerganov approved these changes Aug 26, 2024


ggerganov merged commit 78eb487 into master Aug 27, 2024
50 of 53 checks passed

ggerganov deleted the compilade/fix-deepseek-n_wv branch August 27, 2024 10:09

dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024

llama : fix qs.n_attention_wv for DeepSeek-V2 (ggerganov#9156)

ab939b6

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024

llama : fix qs.n_attention_wv for DeepSeek-V2 (ggerganov#9156)

7f15cdc

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024

llama : fix qs.n_attention_wv for DeepSeek-V2 (ggerganov#9156)

24f0dbb
